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ABSTRACT 

We present a new algorithm to generate a random (unclustered) version of an magnitude 
limited observational galaxy redshift catalogue. It takes into account both galaxy evolution 
and the perturbing effects of large scale structure. The key to the algorithm is a maximum 
likelihood (ML) method for jointly estimating both the luminosity function (LF) and the over- 
density as a function of redshift. The random catalogue algorithm then works by cloning each 
galaxy in the original catalogue, with the number of clones determined by the ML solution. 
Each of these cloned galaxies is then assigned a random redshift uniformly distributed over 
the accessible survey volume, taking account of the survey magnitude limit(s) and, optionally, 
both luminosity and number density evolution. The resulting random catalogues, which can 
be employed in traditional estimates of galaxy clustering, make fuller use of the information 
available in the original catalogue and hence are superior to simply fitting a functional form 
to the observed redshift distribution. They are particularly well suited to studies of the depen- 
dence of galaxy clustering on galaxy properties as each galaxy in the random catalogue has 
the same list of attributes as measured for the galaxies in the genuine catalogue. The deriva- 
tion of the joint overdensity and LF estimator reveals the limit in which the ML estimate 
reduces to the standard 1/U max LF estimate, namely when one makes the prior assumption 
that the are no fluctuations in the radial overdensity. The new ML estimator can be viewed 
as a generalization of the 1/U max estimate in which V mllx is replaced by a density corrected 

T^dc,max 

Key words: galaxies: luminosity function, large-scale structure of Universe 
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& . 1 INTRODUCTION 

Studies of galaxy clustering as a function of the galaxy properties 
are placing increasingly powerful constraints on models of galaxy 
formation. For instance, the quantification of the dependence of the 
strength of galaxy clustering on luminosity and colour (Norberg 
et al. 2002; Zehavi et al. 2005) constrains how the distribution in 
mass of the dark matter halos that host the galaxies depends on 
luminosity and colour. This information, in turn, places very useful 
constraints on models of galaxy formation (e.g. Kim et al. 2009). 
Such techniques are being extended to new wavelengths (e.g. Guo 
et al. 201 1) and higher redshifts (e.g. Coil et al. 2008). 

Measuring the galaxy correlation function usually involves 
counting galaxy pairs and comparing to the expectation for an un- 
clustered or random catalogue (Hamilton 1993; Landy & Szalay 
1993). If one has a very large galaxy redshift survey then the red- 
shift used for the random catalogue can be determined fairly ac- 
curately by fitting some assumed functional form to the observed 
distribution. However, this is not ideal if the survey is not large or 
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one wants to subdivide it into smaller samples in bins of luminos- 
ity or colour. In such cases one can artificially suppress the mea- 
sured clustering by over fitting random fluctuations in the redshift 
distribution. An alternative method is to predict the galaxy red- 
shift distribution from an estimate of the galaxy luminosity func- 
tion (LF) and the flux and other selection limits of the survey (e.g. 
Cole et al. 2005). The redshift distribution derived by this tech- 
nique is less susceptible to distortions from density fluctuations as 
one can use estimators of the galaxy LF that are independent of 
the galaxy density (Sandage, Tammann & Yahil 1979; Efstathiou, 
Ellis & Peterson 1988). Also, one predicts not only the redshift, 
but also the luminosity of each galaxy in the random catalogue 
and so a single random catalogue can be used to estimate galaxy 
clustering as a function of luminosity. However if one wants to ex- 
tend this technique so that one can measure galaxy clustering as a 
function of other properties, e.g. colour and surface brightness, one 
has the more complicated task of first estimating a multi-variate 
luminosity-colour-surface brightness distribution function. 

We develop a new algorithm for generating a random galaxy 
catalogue that corresponds to a given observed catalogue defined 
by a simple flux limit. This is a maximum likelihood estimator for 
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the LF, $(L), in which, like the standard l/\/ max (Schmidt 1968; 
Felten 1976) estimator, $(-£/) reduces to a weighted sum over the 
galaxies with luminosity L, but unlike 1/V max explicitly accounts 
for fluctuations in the galaxy density with redshift. As each ob- 
served galaxy contributes linearly to this estimated LF, this means 
that a random catalogue with a consistent LF can be generated by 
simply cloning galaxies from the observed catalogue, with a rate 
which we derive from a maximum likelihood analysis, and redis- 
tributing them uniformly over the volume in which they would sat- 
isfy the survey selection criteria. As each galaxy in the random cat- 
alogue is a clone of an observed galaxy it carries with it all the 
measured properties of that galaxy. Hence, provided they can be 
modified for the change in redshift (e.g. k-correcting luminosities), 
the resulting random catalogue has all the properties of the orig- 
inal and can be used to study clustering as a function of any of 
those properties. This technique should be particularly applicable 
to multi-wavelength surveys such as GAMA (Driver et al. 2011) 
and its overlap with H-ATLAS (Eales et al. 2010), 6dF (Jones et al. 
2009), zCOSMOS (Lilly et al. 2007) and future redshift surveys 
designed to probe galaxy evolution. 

In Section 2 we develop a joint maximum likelihood estima- 
tor for an assumed non-evolving LF and the run of overdensity as 
a function of redshift. We, also, show how the LF estimator relates 
to the standard l/l/ max estimator. Section 3 extends this estimator 
to include galaxy evolution. In Section 4 we show how the estima- 
tor can be extended to provide a simple algorithm for generating a 
random galaxy catalogue. The method is tested and illustrated with 
mock data in Section 5 and we conclude in Section 6. 



2 LUMINOSITY FUNCTION ESTIMATION 

The commonly used STY (Sandage, Tammann & Yahil 1979) and 
EEP (Efstathiou, Ellis & Peterson 1988) maximum likelihood es- 
timators of the galaxy luminosity function (LF) assume the prob- 
ability of a galaxy having luminosity in the interval L — dL/2 to 
L + dL/2 in a volume element d 3 x centred at position x can be 
factorized as 



P{L, x) dL d 3 x = 0(L)p(x) dL d 3 x. 



(1) 



They then construct estimators that are independent of the density, 
p(x), by factoring out its dependence. 

Thus they start with the following conditional probability 



p a = 



4>{L) dL 



(2) 



that in an apparent magnitude limited catalogue a galaxy a at red- 
shift z a will have luminosity L a 

The STY and EEP methods differ in that STY assume a para- 
metric (Schechter function) form for the LF, while EEP simply 
adopt a stepwise (binned) description of the LF. In both cases the 
derivation of the LF estimator follows by forming the likelihood, 
which is the total probability for the whole galaxy sample given the 
model parameters, 



C = U a p a , 



(3) 



and maximising this likelihood (or its logarithm) over the model 
parameters (bin values in the case of EEP). 

If we are interested in estimating both the LF and the spheri- 
cally averaged density field we can instead start with the joint prob- 
ability 
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A(z c 



x dV(z a ) 
' dz 



<t>{L 



SHz)%S?^ (z) <t>( L )dLdz 



(4) 



of finding a galaxy at redshift z a with luminosity L a in an apparent 
magnitude limited sample. Here dV /dz is the differential of the 
survey volume with redshift and A(z) is the galaxy overdensity 
(averaged over a radial bin) at redshift z. Here we are assuming 
that there is no redshift evolution of the luminosity function and 
hence p(x) varies only due to density fluctuations. Adopting binned 
estimates of both the luminosity function cf>i and overdensity A p we 
can write this probability as 



p a = 



J2 P V P A P D{z a \z p ) 4>i D{La\Li) 



(5) 



Here the sum over p (later also g) runs over redshift bins with 
V p being the volume and A p the galaxy overdensity of the bin. 
The sum over i (later also j) runs over the bins in the luminosity 
function with 4>i being equal to (j>(L) dL for that bin. The func- 
tions D(z a \z p ) and D(L a \Li) represent simple binning functions 
which are unity if galaxy a falls in the corresponding redshift and 
luminosity bin and zero otherwise. Similarly S(L r p nlri \Li) is a step- 
function which is unity if the minimum luminosity L™ ln required 
for a galaxy to make it into the magnitude limited sample at the 
redshift of bin p is fainter than the luminosity Li of that bin. Using 
this notation we can write 



In 



C = (fo^VpAp D(z a \z p ) + ln^2<j> t D{La\L l ) 

a p i 

-ln^^A^^S^L,)). (6) 



For the maximum likelihood solution, the derivatives of In C with 
respect to bin values A 9 and <f>j will be zero. Hence we have 



dlnC 
dA a 



and 
d\n£ 



= =V 



V q D( Za \z q ) 



2. 



D[L a \Lj) 



(7) 



E'£.&£>(L a |L i ) 



j: p VpApS(Lf»\L,) 
2^ V VpAp^iSiL^LiY 



(8) 



The meaning of the various terms in these equations can be made 
more explicit by adopting the following notation. Let the estimate 
of the number of galaxies in the survey based on the values of (j>i 



and A p be 



JVto 



iS(Lf n \Li). 



(9) 



Let the number of galaxies falling in each luminosity and redshift 
bin be iVj and N p respectively and let 



(10) 



be the predicted mean galaxy number density in redshift bin q based 
on the estimated LF and assuming the mean density , i.e. A, = 1. 
Finally let 
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^dc.max = A p V p SiL^Lj), 



(11) 



which is a density corrected version of the normal y max in which 
the volume elements, V p , are weighted by the estimated overdensi- 
ties, A p . 

Using this notation we can rewrite the two constraint equa- 
tions as 

o=M_«i and = !k_ N ~T m ~, 02) 

V q A q N tot <t>j JVtot 

which rearrange to give the coupled equations 



N q JV tot 
V„n„ Ntot 



and 



Nj Ntot 



(13) 



To the extent to which the maximum likelihood model is a 
good description of the data 7V to t = iVtot and so these equations 
simplify to quite intuitive estimators 



V q h q 



and 



'3 -r rdc,max 
3 



(14) 



The first of these equations simply says that the estimate of the 
overdensity is the measured density divided by that predicted by 
the LF, while the second equation is equivalent to 



(15) 



with the sum being over galaxies within that luminosity bin, i.e. the 
normal 1/V max estimator, but with \/ max replaced by y dc ' max . 

We note that this maximum likelihood estimate of the LF is 
equivalent to the standard 1/V max estimator if one makes the prior 
assumption that A q = 1, i.e. that there are no fluctuations in the 
radial galaxy density. 

Choloniewski (1986) derived the same estimator of the LF us- 
ing a different approach in which it was assumed that the number of 
galaxies in a given luminosity and redshift bin were drawn from a 
Poisson distribution. Our derivation shows that the estimator does 
not depend on the details of the assumed statistical distribution. 
The same density estimator was derived by maximum likelihood 
in section 8 of Saunders et al. (1990). They also stated that an im- 
proved estimate of the LF could be made by making the same den- 
sity correction to y max 5 though they did not derive this result via 
maximum likelihood. Another related analysis is that of Heyl et al. 
(1997). They followed similar steps but choose not to make the 
separability assumption of equation (1) so as to be able to directly 
probe evolution of the shape of the LF using wide redshift bins. 

Before detailing our simple algorithm for generating a random 
catalogue that is consistent with the LF given by equation (15), we 
will generalize this result to take account of redshift evolution. The 
resulting algorithm, described in Section 4, can then be applied to 
surveys that span a wide range of redshifts. 



3 ALLOWING FOR REDSHIFT EVOLUTION 

First let us consider the case where one has external knowledge 
of the evolution of the galaxy population. For instance, one might 
have evolutionary corrections for each galaxy or an average for the 
population based on fitting stellar population synthesis models (e.g. 
Bruzual & Chariot 2003; Blanton & Roweis 2007) to the observed 



galaxy colours. One could also have a pre-imposed model for den- 
sity evolution, e.g. that the amplitude of the galaxy luminosity func- 
tion, <J>*, varies with redshift as <E>*(z) = P(z)<t*(Q). In this case 
the only changes that are needed to the above estimators are: 

(i) when computing the redshift range over which a given galaxy 
satisfies the catalogue selection criteria include the e-correction 
along with the k-correction and 

(ii) include the factor P(z), by which $* evolves, in the defini- 
tion of T/ dc ' max . 



Thus, we redefine V A 



for galaxy a used in equation (15) to be 



^dc.max = A p P p V p S(Lf n \L a ), 



(16) 



which simply represents and integral over the survey volume 
weighted by the combined factor A(z)P(z) with limits set by the 
redshift range over which galaxy a would satisfy the survey selec- 
tion criteria. 

If one does not have foreknowledge of the evolution one can 
instead parameterise the evolution and use the survey data to con- 
strain its parameters by an extension of the maximum likelihood 
technique. For instance for the P(z) model of <E>* evolution intro- 
duced above, equation (6) becomes 

ln£ = ^2^1n^2v p P p A p D(z a \z p ) + \n^2(j>,D(L a \L t ) 

a p i 

-ln^VpPpAp^&^LflLi)). (17) 

p i 

Here the parametric form of P(z) might simply be P(z) = 1 + az 
with a being the evolution parameter we wish to determine. The 
method is easily generalized to more parameters. As P p and A p 
always appear as a pair in this likelihood function they are degener- 
ate, i.e. we are unable distinguish evolution in the number density 
of galaxies from a redshift dependent change in the overdensity. 
If, however, we are able to specify the expected amplitude of the 
density fluctuations then this will enable the likelihood analysis to 
distinguish fluctuations from smooth evolution 1 . If the redshift bins 
are sufficiently large in volume we can make a simple estimate of 
the expected fluctuations in the galaxy overdensity using the inte- 
gral J3 = j £(r)r 2 dr (assumed to be a constant when integrated 
to scales ;> 10ft _1 Mpc) of the galaxy correlation function, £(r) 
(Peebles 1980). The resulting expected variance in A p is 

2 1 + 4irh p J 3 
° P = z—r, , (18) 

n p V p 

with the second term enhancing the variance above the Poisson 
value because galaxy positions are correlated and tend to come in 
clumps of 4TvhJ 3 galaxies at a time. Assuming the density fluc- 
tuations are Gaussian distributed with this variance and including 
this as a prior probability which multiplies our likelihood function, 



V = CxP D 



, we can replace equation (17) with the following 



equation for the logarithm of the posterior probability (to within an 
unimportant additive constant) 

lnP = ^2 (ln^2v p P p A p D(z a \z p ) + \n^2<f>iD(L a \Li) 



1 If the estimate of the variance of the density fluctuations is inaccurate or 
the function P(z) is given too much freedom then this may lead to bias 
in the recovered evolution parameters, but for the smooth evolution models 
considered here we find no evidence of bias. 
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(Ag - I) 2 
2a 2 



(19) 



The final term breaks the degeneracy between P p and A p and so al- 
lows us to solve for the evolution parameter. In some instances, e.g. 
for a small survey in which density evolution is inevitably poorly 
constrained, it may be beneficial to place a Gaussian prior 

T^prior(^) = 



■ exp(-a 2 /2cr 2 ) 



(20) 



'271-0-0 

on the density evolution parameter. 

The final modification is to use a Lagrange multiplier, u, to 
impose the constraint that, in the absence of density fluctuations, 
the predicted number of galaxies, J2 q n q V q , equals the number 
in the genuine catalogue, /V t ot- In the simple case presented in 
Section 2 this is not necessary as the likelihood expression of 
equation (6) is invariant under the transformation <j>i — > 9<fn and 
A p — > A p /9. Thus, in that case one can simply impose this nor- 
malization constraint after having found the ML solution. However, 
the introduction of last term in equation (19) has broken this sym- 
metry and so to maximise equation (19) subject to this constraint 
we need instead to maximise 



lni 



A = InV - n^2{n q V g - N tot ). 



(21) 



Following the same steps that led from equation (6) to (12), 
but now also setting the derivatives 



rflnA 







and 



dlnA 



= 0, 



(22) 



da dfi 

where p, is the Lagrange multiplier and a is the parameter of the 
evolution model P(z), leads to the following ML solution, 

A„- 1 







= 



dlnP q 
da 



= 



1 



(23) 



(24) 



(25) 



(26) 



Here we have generalized the earlier notation to include the P(z) 
model so that 



n q = P q ^<t>i S(Lf n \Li), 

i 

^c,™* = J2A p P p V p S(Lf"\L j ), 



(27) 



(28) 



(iii) Evaluate n q using 



PqVqS(L™ in , La) 

y'max 
a 



yn 



ydc.max + ^max 



, (30) 



which follows from evaluating equation (27) using the estimate of 
(f>j given by equation (24). 2 

(iv) Substitute this estimate of h q into equation (23) to solve for 
the A . 

(v) Solve for the number density evolution parameter, a, by find- 
ing the root of equation (25). 3 

(vi) Now repeat this process from step (i) until the A g and the 
P q converge. 

In the iterative process described above we never explicitly 
evaluate the luminosity function, $(1/), though one could do this 
at any stage by simply evaluating 



4>(L) = 



V dc ' max (L a ) + ^V max (L a ) 



(31) 



which follows from equation (24). Hence although we derived the 
method by considering a binned estimate of the luminosity function 
this binning does not enter in any way in determining the parame- 
ters A 9 and a or into the predicted redshift distribution, h q V q , they 
imply. 

One could deal with luminosity evolution in an analogous 
way. First define the e-correction term in the standard way so that 
absolute, M, and apparent, m, magnitudes are related by 



M = m- 51og 10 d lum (z) - k(z) - e(z), 



(32) 



where di um is the luminosity distance and k(z) the k-correction 
(see e.g. Hogg et al. 2002). Then parameterize the e-correction (or 
its deviation from a default individual e-correction for each galaxy) 
as e.g. e(z) — uz and maximize the posterior probability with 
respect to the parameter u. This yields the constraint equation 



dlnP 

du 



= = 



^-~> du 



-J2v p P p (A p + n)4>(Ll 



j dL p ln 
du 



u 



(33) 



where the last term comes from assuming a Gaussian prior on the 
evolution parameter. The other terms depend on u through the 
implicit dependence of the luminosities L a and L™ 111 on the e- 
correction via the relationship between the inferred absolute mag- 
nitude, the observed apparent magnitude, m a and redshift z a , 



m a - 51og lo di um (0 a ) - k(z a ) - e(z a ), 



(34) 



and made use of the result that if the model accurately describes the 
data then 



JVtot = E p pVp A p 2 ^ S( . L f n \ L ') = ^tot- 



(29) 



These equations can be solved efficiently by an iterative 
method. Starting with A q = 1 and P q = 1 (or a prior guess for 
the evolution parameter a). 

(i) Evaluate \/ dc . max an d \/ max for each galaxy using the cur- 



rent values of A q and P a 



I v m 

(ii) Find the value of ji such that ^ yJC IUl ;^ 



which is achieved easily using the Newton-Raphson method. 



= 1, 



2 We have written the equation in this form as if we then sum over the 
redshift bins, q, it is straightforward to see that the choice of fi from step (ii) 
ensures that equation (26) is satisfied. In practice, we find -C 1 and that 
setting fj, = makes very little difference to the resulting LF and redshift 
distribution. 

3 Here we assume that as n q oc P q / ™9^9> w hich is appropriate if (j>j 
and A q are being held fixed and the normalization constraint, equation (26), 
is being maintained. The approximate scaling of n q used in step (v) does not 
have to be exact. We use it as a fast way of estimating n q at any value of the 
evolution parameter a from the existing estimate we have at a = a' from 
step (iii). Once we have iterated these equations to the point they converge 
then a R3 a' and so these scaling factors all tend to unity. The approximation 
used in this scaling only effects the speed of convergence. 
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and through the dependence of the limiting absolute magnitude at 
redshift z p on the apparent magnitude limit of the survey, rafaint, 



Mfaint = mf a int - 5 log 10 di um (%>) - k(z p ) - e(z p ) . 



(35) 



Hence, u can be found in an iterative way, updating u by finding 
the root of equation (33) in the same way as we update a by finding 
the root of equation (25). Implementing this modified alogrithm re- 
quires a smooth luminosity binning scheme, as in Efstathiou et al. 
(1988), so that the derivative dNj/du is well defined. Although we 
have successfully implemented such a scheme we prefer to present 
results in which we use the simpler iterative algorithm detailed 
above. This is sufficiently fast that we can repeat it for different 
fixed values of the e-correction (it), iterating to the final solution 
for each value of u, and then search over the values of u to find the 
value which maximises the logarithm of the posterior probability 



In-P = 



( ln J2 V p A p D(z a \z p ) + In D{L a \L t ) 

ol p i 

-ln^y p A p ^^S(Lf n |^)) 



E 



(Ap - l) 2 
2o% 



2ol 2al 



(36) 



The initial terms come from equation (19) and the terms on the 
final line of this equation come from assumed Gaussian priors on 
the evolution parameters a and u. The term on the second line is 
effectively constant as it involves only the total number of galaxies 
predicted by the model. Thus, to within an unimportant additive 
constant we can evaluate this expression as 

In-P = Y (in^VpAp D(z a \z p ) + ln^ 0; .D(L a |L;)) 



E 



(Ap - l) 2 
2a 2 p 



a 

2^1 



u 

2^J' 



or equivalently in terms of the binned quantities as 
In-P = ^iVpln^A^ + ^jViln^i) 



E 



(Ap - l) 2 
2al 



a 

2°i 



u 

2ol' 



(38) 



Thus for each trial value of the luminosity evolution parameter u 
one evaluates this expression using the values of <pi and A p that 
result from the iterative solution of equations (23) to (26) and then 
simply selects the most probable model. 



4 GENERATING A RANDOM CATALOGUE 

The LF estimates we have derived in Sections 2 and 3 are both sim- 
ply weighted sums over the galaxies of that luminosity. This feature 
means they are very well suited for generating random catalogues. 
Rather than having to estimate the LF and then compute the num- 
ber of galaxies expected at a given redshift in the random catalogue 
as an integral over one can instead carry out a weighted du- 

plication of the galaxies in the original catalogue with each being 
redistributed in redshift. 

The key to the algorithm is equation (30). The left hand 
side of this equation is the predicted number of galaxies in the 
redshift bin z q of the random catalogue. The right hand side of 



the equation we can interpret as saying each galaxy in the orig- 
inal catalogue has a weight w a = vdc , max+ ^ ymax and because 

V a max = p qV q S(L™ in , L a ) we see that the first term indicates 
that this weight is distributed amongst the redshift bins according to 
the fraction of its y max that falls within each bin. This interpreta- 
tion of equation (30) leads to a very simple Monte Carlo algorithm 
for generating a random catalogue, i.e. the galaxy catalogue one 
would expect if there were no galaxy clustering. 

To generate a random catalogue with approximately iVtimcs 
as many galaxies as the original we proceed as follows. Loop 
over the galaxies in the original catalogue and, for each one, place 
■WtimesWo, duplicates 4 into the random catalogue, with the red- 
shift of each duplicate being randomly selected within the volume 
ymax ^ s access i]ji e to that galaxy. These weights correct for 
the fact that galaxies of a given luminosity may be over- or under- 
represented in the original catalogue as a result of density fluctua- 
tions within the volume probed by the catalogue. The definition of 
V max used here should include the P(z) factor, but not A(z), i.e. 



t rmax / max\ 

V (z a ) 



Jo 



dV 

dz 



P(z)dz, 



(39) 



where 2™ ax is the redshift at which the galaxy a would drop out- 
side the survey selection criteria. A fast algorithm to achieve this 
is to first generate a lookup table for V max (,z). Then, for the clone 
of each galaxy, a, one generates a uniform random variable, s, in 
the interval [0, 1] and uses the lookup table to assign it the redshift 
at which V max (z) = sV max (C ax )- The redshift dependent prop- 
erties of the galaxy such as apparent magnitude must be adjusted 
using the distance modulus, k- and e-corrections to this assigned 
redshift. The angular position of the galaxy can be independently 
randomly chosen within the angular footprint of the survey. The re- 
sult is a random catalogue with a smooth redshift distribution and 
luminosity function consistent with the maximum likelihood value 



(37) 

given by equation (31).' 



5 RESULTS 

As a first test of our algorithm we have analysed a mock galaxy cat- 
alogue that has been constructed from the Virgo Millennium Sim- 
ulation (Springel et al. 2005). The simulation was populated with 
galaxies using the Bower et al. (2006) version of the GALFORM 
semi-analytic model. 6 

In Fig. 1 we show the redshift distribution of a shallow, r < 
17.5 and z < 0.2, portion of a 1000 square degree region of this 
mock catalogue. The redshift distribution is very structured as a re- 
sult of realistic large scale structure - voids, filaments and clusters 
- in the three dimensional galaxy distribution (Springel et al. 2005). 
The smooth curves in the upper panel of Fig. 1 show the redshift 
distributions of our corresponding random catalogues. The dotted 



4 Although this ratio is not in general an integer one can round up or down 
with probabilities chosen such that the mean is the required value. 

5 A related random catalogue algorithm was explored in Cresswell (2010), 
but without applying the density dependent weights, w a that are required 
by this maximum likelihood derivation. Cresswell (2010) used the resulting 
redshift distribution as an alternative to LF based prediction employed in 
Cresswell & Percival (2009) when quantifying scale dependent bias for red 
and blue galaxies in SDSS. 

6 This catalogue is a prototype of set of mock Pan-STARRS galaxy cata- 
logues available at https://psl-durham.dur.ac.uk/mocks. 
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data 

first iteration 

— mid iteration 

— last iteration 




Figure 1. The upper panel compares the redshift distribution of the data 
from a mock catalogue with the predicted smooth redshift distributions of 
selected iterations of the random catalogue. The first iteration is shown by 
the dotted (green) curve and a subsequent and final iteration by the dashed 
(blue) and solid (red) curves respectively. The lower panel shows the over- 
density in redshift shells, A(z), of the mock catalogue compared to the dif- 
ferent iterations of the random catalogue. In both panels the dashed (blue) 
curves are almost coincident with the solid (red) curves. 



first iteration (standard 1/V max ) 

— mid iteration 

— last iteration 



M r -51og 10 h 

Figure 2. The r-band luminosity function of selected iterations of the ran- 
dom catalogue. The estimate from the first iteration, shown by the dotted 
(green) curve, is simply the standard 1/V max estimate of the luminosity 
function. Subsequent iterations, shown by the dashed (blue) and almost co- 
incident solid (red) curves, rapidly converge. 



(green) curve is the result of the simple algorithm in which the cat- 
alogue galaxies are just randomized within the accessible volume, 
F m ", within which the galaxy could be detected and meet the se- 
lection criteria of the catalogue. In this process a simple r-band 
k-correction, 



was assumed for all galaxies, this being typical of the fc-correction 
given by Blanton & Roweis (2007) for r-band selected galaxies in 
the SDDS survey. The evolution, e-correction, was assumed to be 
negligible. Even without reference to the other models it is clear 
that this redshift distribution has been biased by the presence of 
large scale structure. For instance the overdensity at z ~ 0.04 re- 
sults in a shoulder in the redshift distribution of the random cata- 
logue. 

The two remaining and almost identical curves in the upper 
panel of Fig. 1 show the redshift distributions of the random cata- 
logues that result from taking the y max based estimate as a starting 
point and applying the iterative procedure described in Section 4 to 
find the solutions to equations (14). The same fc-correction and no 
evolution were assumed as in the y max based estimate. This proce- 
dure rapidly converges to a stable random catalogue with a smooth 
redshift distribution which is unbiased by the large scale structure. 
The lower panel of Fig. 1 shows the overdensity of the mock cata- 
logue as a function of redshift, estimated as the ratio of the redshift 
distribution of the mock catalogue to that of the random catalogue. 
It is clear that the y max based estimate, like methods which sim- 
ply fit the observed redshift distribution, underestimates the true 
amplitude of the density fluctuations and would lead to biased esti- 
mates of galaxy correlation functions and other large scale structure 
statistics. 

The estimated luminosity functions corresponding to these 
different random catalogues are shown in Fig. 2. We again see ex- 
cellent convergence in estimates resulting from our iterative pro- 
cedure. In this case, the l/y max estimate, which is our starting 
point, is biased high at intermediate magnitudes by the overdensity 
at z « 0.04. 

To test the method further we set up a deeper galaxy catalogue 
with a known luminosity function and explicit luminosity and den- 
sity evolution. To achieve this we first set up a galaxy catalogue 
with no spatial clustering by sampling the evolving Schechter lu- 
minosity function 



$(L) = P(z) 3>* 



L 



U{z) 



exp 



L 



U{z) 



(41) 



in a standard flat cosmology with density parameter f2 m = 0.3 and 
cosmological constant Qa = 0.7. For the parameters of this evolv- 
ing Schechter function we adopted $„ = 1.49 x 10" 2 h 3 Mpc" 3 , 
a — 1.05, P(z) = exp(0.18z) and L,(z) equivalent to character- 
istic r-band absolute magnitude M* = —20.37 at z = with an 
assumed e-correction term 



e(z) = -1.622. 



(42) 



k(z) = 0.87z + 1.38z 



(40) 



The fc-correction was again given by equation (40). These choices 
are compatible with the parameterization of the SDSS r-band lu- 
minosity estimated by Blanton et al. (2003), though they chose to 
work in a magnitude system referenced to 2 = 0.1. The resulting 
redshift distribution for an r < 24 magnitude limited catalogue of 
5 square degrees is shown by the dashed (blue) line in the upper 
panel of Fig. 3, labelled "truth". 

To impose density fluctuations on the smooth redshift distribu- 
tion we divided the catalogue into redshift bins, with volumes V p , 
and for each bin generated a random density perturbation 8 P > — 1 
drawn from a truncated Gaussian with variance 4nJs/V p . Here 
we chose 471-J3 = 5000, which is appropriate for L„ galaxies 
(Hawkins et al. 2003). We then generated the catalogue with the 
redshift distribution shown by the histogram in Fig. 3 by randomly 
accepting galaxies from a D times denser version of original un- 
clustered catalogue with probability (1 + S p ) /D. The Poisson flue- 
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Figure 3. The upper panel shows two sets of redshift distributions. The up- 
per distributions are for the full population of galaxies in a 5 square degree, 
r < 24 magnitude limited survey. The lower distributions are for the sub- 
set of these galaxies with absolute magnitudes M r < —20. In both cases 
the clumpy distribution (black histograms) from the synthetic catalogue is 
compared with the smooth redshift distributions of two random catalogues 
and that of the original uniform catalogue (blue dashed curves) from which 
it was constructed. As described in the text the synthetic catalogue includes 
both luminosity and density evolution. The lower panel shows the ratio, 
A(z), of the full redshift distribution of the data to each of the random 
catalogues. The random catalogue shown by the dotted (green) curves, the 
starting point of the iterative process, is based on the y max of each galaxy 
and ignores both luminosity and density evolution. For the random cata- 
logue shown by the solid (red) curves, the iterative procedure described in 
Section 3 has been applied to determine the luminosity and density evo- 
lution parameters that maximise the posterior probability, equation (38). In 
each case the solid (red) curves are almost coincident with the (blue) dashed 
curves. The error bars shown in the lower panel are the expected level of 
fluctuations as given by equation (18). 

tuations from this sampling process combine with the imposed fluc- 
tuations, S p , to produce fluctuations consistent with the variance 
given by equation (18). 

Taking this catalogue as input we generated corresponding 
random catalogues by applying the iterative procedure described 
in Section 3 to find the solutions to equations (23) to (25) and max- 
imise the posterior probability given in equation (38). Here we as- 
sumed the density evolution to be of the form 

P(z) = exp((a + 0.18)z) (43) 

and the luminosity evolution of the form 

e(z) = -0.5z + uz (44) 

with a and u being free parameters. Hence we would hope to find 
a w 0.0 andtt « -1.12. 

As the starting point of the iterative process we assumed a — 
and u — (i.e. the default density evolution, but insufficient lumi- 
nosity evolution), with Gaussian priors of width a a — 0.05 and 
g u = 1.5. Under these assumptions the initial l/ max based esti- 
mate results in a random catalogue with the redshift distribution 
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Figure 4. Comparison of the input Schechter luminosity function with those 
recovered by y max and the iterative maximum likelihood method. For a 
fair comparison, the input Schechter function has been averaged over the 
0.25 mag width bins used in the other estimates. 



shown by the dotted (green) curve in Fig. 3. This can be seen to be 
biased high at z < 0.1 by a local overdensity and to underpredict 
the number of galaxies at z J> 0.8 due to its lack of evolution. This 
is seen more clearly in the lower panel which plots the overden- 
sity estimated as the ratio of the redshift distributions of the input 
catalogue to the random catalogue. 

The maximum likelihood random catalogue is shown by the 
solid (red) curves in Fig. 3. The converged result for the evolution 
parameters is a — 0.05 and u — —1.11, which are close to the 
true values. One does not expect to recover the exact input values 
as the density fluctuations introduce noise into the estimates. One 
could determine formal errors on all the model parameters by de- 
termining the Fisher matrix from the second derivatives of the like- 
lihood function. However, it is probably simpler, more convenient 
and more robust to determine the errors by repeating the whole pro- 
cedure on jackknife samples of the original catalogue. For a cata- 
logue of this particular size and depth it turns out that the density 
evolution parameter a is only weakly constrained and hence the 
prior on a is playing a role (i.e. a broader prior leads to a different 
a, but the resulting random catalogues are hardly distinguishable). 
In contrast, the luminosity evolution parameter, u, is tightly con- 
strained and the input value is recovered quite accurately. This is 
true provided that sufficiently narrow magnitude bins are used for 
the LF. We have found that using wide bins leads to an underes- 
timate of the degree of luminosity evolution, Broadening the un- 
derlying luminosity function by the bin width artificially boosts the 
bright end of the LF and so, just like luminosity evolution, it makes 
a tail of high redshift luminous galaxies more probable. With mag- 
nitude bins of width less than 0.5 magnitudes this effect is very 
small. 

In Fig. 3, one can see that this procedure has produced a 
smooth redshift distribution that is in accurate agreement with the 
true underlying redshift distribution from which the synthetic cata- 
logue was constructed. The redshift distributions that are shown in 
Fig. 3 for the subset of galaxies with absolute magnitudes M r < 
—20 illustrate that the random catalogue we have produced can be 
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used to model the underlying smooth redshift distribution of any 
selected subset of the data. 

We compare the input and recovered z = luminosity func- 
tions in Fig 4. We see the initial 1/V max is shifted towards bright 
magnitudes due to the incorrect luminosity function and is also bi- 
ased high at the faintest magnitudes due to the local z < 0.1 over- 
density. The maximum likelihood/maximum posterior probability 
estimate has recovered the input luminosity function very accu- 
rately. 



6 CONCLUSIONS 

We have presented a maximum likelihood estimator for the galaxy 
luminosity function which can be viewed as an extension to the 
l/V mM method (Schmidt 1968), taking into account the effect 
of density fluctuations within the volume probed by the galaxy 
catalogue. The standard U max is replaced by a density corrected 
version, \/ dc,max , that explicitly corrects for the over- or under- 
representation of galaxies of a particular luminosity in the cata- 
logue produced by large scale structure. The utility of our lumi- 
nosity function estimator is that it is a very simple and intuitive 
modification of the much used, but biased, 1/U max method. Sim- 
ilar density corrections to 1/V max have been utilised by Croton 
et al. (2005) and Baldry et al. (2006) to study the dependence of 
galaxy properties on environment and to probe the very low mass 
end of the stellar mass function (Baldry et al in preparation), but 
they used an external volume limited galaxy sample as the den- 
sity defining-population rather than computing the overdensity via 
maximum likelihood. 

We extended the maximum likelihood analysis to include arbi- 
trary parametric models of the redshift evolution of both the charac- 
teristic luminosity and number density of the galaxy population and 
described a fast iterative scheme to solve the resulting equations. 7 
Our analysis assumes a redshift catalogue which is complete to a 
single specified apparent magnitude limit. The method can be ex- 
tended to include a model of magnitude dependent incompleteness 
by incorporating an incompleteness term into the likelihood func- 
tion (e.g. see Heyl et al. 1997). To determine V rdc,max one merely 
needs to be able to determine over what range of redshift a given 
observed galaxy would continue to satisfy the survey selection cri- 
teria. Hence, in principle, it ought to possible to extend the method 
to surveys with colour selection. However, more work is required 
to see if modelling colour evolution will prove to be a barrier to 
getting sufficiently accurate models of such selection functions. 

In both the simple and more generalized versions the esti- 
mate of the galaxy luminosity function, $(£), is a simple weighted 
sum over the galaxies of luminosity L. One consequence of this 
is that we have been able to specify a simple algorithm to gen- 
erate unclustered, random galaxy catalogues consistent with this 
luminosity function by simply cloning galaxies (with a frequency 
determined by the weight) from the original catalogue and redis- 
tributing them uniformly throughout the survey volume in which 
they would be detected. At no point in this process is there any 
binning by luminosity and so no assumptions are required about 
the form or smoothness of the luminosity function. One specifies 
redshift bins, within which to estimate the radial overdensity, but 

7 A fully documented Fortran95 subroutine that implements this al- 
gorithm and generates the related random catalogue is available at 
http://astro.dur.ac.ukTcole/publications.html#software. 



the bin widths only very weakly affect the resulting redshift distri- 
bution of the random catalogue which is smooth and continuous. 
Random galaxy catalogues are widely employed when making es- 
timates of galaxy clustering. Often used alternatives such as simple 
parametric fits to the observed redshift distribution are inferior as 
they do not use the full information available in the galaxy cat- 
alogue and are prone to either over fitting density fluctuations or 
failing to capture the true shape of the selection function. These 
shortcomings can lead to underestimating the strength of cluster- 
ing on intermediate scales and overestimating the strength on the 
largest scales. A particular advantage of these new random cata- 
logues is that each galaxy they contain carries with it all the mea- 
sured properties that existed for the observed galaxy from which it 
was cloned. Hence, we expect random catalogues produced by this 
maximum likelihood technique to be particularly valuable for stud- 
ies of how galaxy clustering depends on galaxy properties such as 
colour, surface brightness, morphology or spectral features. 



ACKNOWLEDGEMENTS 

I would like to thank Jim Cresswell as one topic we discussed in 
his PhD viva was partially responsible for prompting me to revisit 
some unfinished work on luminosity function estimation that I had 
started years earlier. I thank Carlos Frenk for several useful discus- 
sions and comments on the manuscript. I also gratefully acknowl- 
edge the support of a Leverhulme Research Fellowship. This work 
was supported in part by an STFC rolling grant to the ICC. The 
calculations for this paper were performed on the ICC Cosmology 
Machine, which is part of the DiRAC Facility jointly funded by 
STFC, the Large Facilities Capital Fund of BIS, and Durham Uni- 
versity. 



REFERENCES 

Baldry I. K., Balogh M. L., Bower R. G., Glazebrook K., Nichol 
R. C, Bamford S. P., Budavari T, 2006, MNRAS, 373, 469 

Blanton M. R. et al., 2003, ApJ, 592, 819 

Blanton M. R., Roweis S., 2007, AJ, 133, 734 

Bower R. G, Benson A. J., Malbon R., Helly J. C, Frenk C. S., 
Baugh C. M., Cole S„ Lacey C. G, 2006, MNRAS, 370, 645 

Bruzual G, Chariot S., 2003, MNRAS, 344, 1000 

Choloniewski J., 1986, MNRAS, 223, 1 

Coil A. L. et al., 2008, ApJ, 672, 153 

Cole S. et al., 2005, MNRAS, 362, 505 

Cresswell J. G, 2010, Portsmouth PhD. thesis 

Cresswell J. G, Percival W. J., 2009, MNRAS, 392, 682 

Croton D. J. et al., 2005, MNRAS, 356, 1 155 

Driver S. P. et al., 2011, MNRAS, 413, 971 

Eales S. et al., 2010, PASP, 122, 499 

Efstathiou G, Ellis R. S., Peterson B. A., 1988, MNRAS, 232, 
431 

Felten J. E., 1976, ApJ, 207, 700 
Guo Q. et al., 201 1, MNRAS, 193 
Hamilton A. J. S., 1993, ApJ, 417, 19 
Hawkins E. et al., 2003, MNRAS, 346, 78 

Heyl J., Colless M., Ellis R. S., Broadhurst T, 1997, MNRAS, 
285, 613 

Hogg D. W., Baldry I. K., Blanton M. R., Eisenstein D. J., 2002, 

ArXiv: astro-ph/02 10394 
Jones D. H. et al., 2009, MNRAS, 399, 683 



Kim H„ Baugh C. M., Cole S., Frenk C. S., Benson A. J., 2009, 

MNRAS, 400, 1527 
Landy S. D., Szalay A. S., 1993, ApJ, 412, 64 
Lilly S. J. et al., 2007, ApJS, 172, 70 
Norberg P. et al., 2002, MNRAS, 332, 827 

Peebles P. J. E., 1980, The large-scale structure of the universe. 

Princeton University Press, 1980. 
Sandage A., Tammann G. A., Yahil A., 1979, ApJ, 232, 352 
Saunders W., Rowan-Robinson M., Lawrence A., Efstathiou G., 

Kaiser N., Ellis R. S., Frenk C. S., 1990, MNRAS, 242, 318 
Schmidt M., 1968, ApJ, 151, 393 
Springel V. et al., 2005, Nature, 435, 629 
Zehavi I. et al., 2005, ApJ, 630, 1 



Random Catalogues 9 



